Multilabel Subject-Based Classification of Poetry

Authors

  • Andres Lou
  • Diana Inkpen
  • Chris Tanasescu
Abstract

Oftentimes, the question “what is this poem about?” has no trivial answer, regardless of the poem's length, style, or author, or the context in which it is found. We propose a simple system for the multi-label classification of poems by subject, following the categories and subcategories laid out by the Poetry Foundation. Our model combines tf-idf and Latent Dirichlet Allocation (LDA) for feature extraction with a Support Vector Machine (SVM) for classification. We measure how likely our models are to correctly assign each poem to one or more main categories and subcategories. Our contribution is thus a new method for automatically classifying poetry given a set of categories and various subsets of that set.
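The pipeline summarized in the abstract (tf-idf plus LDA features feeding an SVM, with several labels per poem) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. It assumes a binary-relevance reading of the multi-label setup (one independent SVM per label), concatenates L2-normalized tf-idf vectors with LDA topic proportions estimated by collapsed Gibbs sampling, and trains each linear SVM with the Pegasos sub-gradient method. The toy poems, the labels `nature` and `love`, and every hyperparameter (`k`, `alpha`, `beta`, `lam`, `epochs`) are hypothetical placeholders, not data or settings from the paper, and the sketch does not model the hierarchy between main categories and subcategories.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs, vocab):
    """L2-normalized tf-idf vector for each tokenized document."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}  # smoothed idf
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        v = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def lda_topic_proportions(docs, vocab, k=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """LDA via collapsed Gibbs sampling; returns the topic mixture of each document."""
    rng = random.Random(seed)
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # doc-topic counts
    nkw = [[0] * V for _ in range(k)]  # topic-word counts
    nk = [0] * k                       # tokens per topic
    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics

    def move(d, wi, t, delta):
        ndk[d][t] += delta
        nkw[t][wi] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            move(d, wid[w], z[d][i], +1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = wid[w]
                move(d, wi, z[d][i], -1)  # take the token out of the counts
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                move(d, wi, z[d][i], +1)  # put it back under its new topic
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos sub-gradient descent on the L2-regularized hinge loss; y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order, t = list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def fit_one_vs_rest(X, label_sets, labels):
    """Binary relevance (an assumption here): one independent SVM per label."""
    return {lab: train_linear_svm(X, [1 if lab in s else -1 for s in label_sets])
            for lab in labels}

def predict_labels(models, x):
    """Assign every label whose SVM places the poem on the positive side."""
    return {lab for lab, w in models.items() if dot(w, x) > 0}

# Hypothetical toy data: five labelled "poems" plus one unlabelled poem to classify.
poems = [
    "the river sings beneath the winter trees",
    "my love your hand in mine tonight",
    "rain on the trees and my love is gone",
    "winter river stone and silent trees",
    "i kiss your hand my heart my love",
    "trees and river under a winter sky",
]
train_labels = [{"nature"}, {"love"}, {"nature", "love"}, {"nature"}, {"love"}]
labels = sorted({lab for s in train_labels for lab in s})
docs = [p.split() for p in poems]
vocab = sorted({w for doc in docs for w in doc})
# tf-idf and LDA are unsupervised, so for brevity both are fitted on all six poems.
tfidf = tfidf_vectors(docs, vocab)
theta = lda_topic_proportions(docs, vocab, k=2)
features = [tv + th + [1.0] for tv, th in zip(tfidf, theta)]  # trailing 1.0 acts as a bias
models = fit_one_vs_rest(features[:-1], train_labels, labels)
prediction = predict_labels(models, features[-1])
print("predicted labels:", prediction)
```

The hand-written pieces only keep the example self-contained; in practice one would more likely reach for scikit-learn's `TfidfVectorizer`, `CountVectorizer` with `LatentDirichletAllocation`, `MultiLabelBinarizer`, and `OneVsRestClassifier(LinearSVC())`. Because each label gets its own classifier, a poem can receive zero, one, or several labels, which is what distinguishes this setup from single-label classification.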


Similar resources

Automated Analysis of Bangla Poetry for Classification and Poet Identification

Computational analysis of poetry is a challenging and interesting task in NLP. Human expertise on stylistics and aesthetics of poetry is generally expensive and scarce. In this work, we delve into the data to automatically extract stylistic and linguistic information which are useful for analysis and comparison of poems. We make use of semantic (word) features to perform subject-based classific...

Combining Instance-Based Learning and Logistic Regression for Multilabel Classification (Resubmission)

Multilabel classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Recent research has shown that, just like for standard classification, instance-based learning algorithms relying on the nearest neighbor estimation principle can be used quite successfully in this context. However, since hitherto existing algorithms do not...

Adapting non-hierarchical multilabel classification methods for hierarchical multilabel classification

In most classification problems, a classifier assigns a single class to each instance and the classes form a flat (non-hierarchical) structure, without superclasses or subclasses. In hierarchical multilabel classification problems, the classes are hierarchically structured, with superclasses and subclasses, and instances can be simultaneously assigned to two or more classes at the same hierarch...

Case-Based Multilabel Ranking

We present a case-based approach to multilabel ranking, a recent extension of the well-known problem of multilabel classification. Roughly speaking, a multilabel ranking refines a multilabel classification in the sense that, while the latter only splits a predefined label set into relevant and irrelevant labels, the former furthermore puts the labels within both parts of this bipartition in a t...

MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation

Learning from imbalanced data is a problem which arises in many real-world scenarios, so does the need to build classifiers able to predict more than one class label simultaneously (multilabel classification). Dealing with imbalance by means of resampling methods is an approach that has been deeply studied lately, primarily in the context of traditional (non-multilabel) classification. In this ...

Publication date: 2015